
From Axioms to Algorithms: Mechanized Proofs of the vNM Utility Theorem

Li, Jingyuan

arXiv.org Artificial Intelligence

This paper presents a comprehensive formalization of the von Neumann-Morgenstern (vNM) expected utility theorem using the Lean 4 interactive theorem prover. We implement the classical axioms of preference (completeness, transitivity, continuity, and independence), enabling machine-verified proofs of both the existence and uniqueness of utility representations. Our formalization captures the mathematical structure of preference relations over lotteries, verifying that preferences satisfying the vNM axioms can be represented by expected utility maximization. Our contributions include a granular implementation of the independence axiom, formally verified proofs of fundamental claims about mixture lotteries, constructive demonstrations of utility existence, and computational experiments validating the results. We prove equivalence to classical presentations while offering greater precision at decision boundaries. This formalization provides a rigorous foundation for applications in economic modeling, AI alignment, and management decision systems, bridging the gap between theoretical decision theory and computational implementation.
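To give a flavor of what such a formalization involves, here is a minimal, self-contained Lean 4 sketch (our illustration, not the paper's development): an abstract mixture space with a preference relation, together with statements of two vNM axioms. All names (`MixtureSpace`, `sPref`, `Complete`, `Independence`) are ours.

```lean
-- Minimal sketch: lotteries as an abstract type with a mixture operation.
structure MixtureSpace where
  Lottery : Type
  /-- `mix a p q` stands for the compound lottery `a·p + (1-a)·q`. -/
  mix : Rat → Lottery → Lottery → Lottery
  /-- Weak preference `p ≿ q`. -/
  pref : Lottery → Lottery → Prop

namespace MixtureSpace

variable (M : MixtureSpace)

/-- Strict preference `p ≻ q`. -/
def sPref (p q : M.Lottery) : Prop := M.pref p q ∧ ¬ M.pref q p

/-- Completeness: any two lotteries are comparable. -/
def Complete : Prop := ∀ p q : M.Lottery, M.pref p q ∨ M.pref q p

/-- Independence: mixing both sides with a common lottery `r` at a
    weight `a ∈ (0, 1]` preserves strict preference. -/
def Independence : Prop :=
  ∀ (p q r : M.Lottery) (a : Rat), 0 < a → a ≤ 1 →
    (M.sPref p q ↔ M.sPref (M.mix a p r) (M.mix a q r))

end MixtureSpace
```

The vNM theorem then states that under these axioms (plus continuity) there is a utility function on lotteries, affine in mixtures, that represents the preference relation; the paper mechanizes both the existence part and the uniqueness-up-to-positive-affine-transformation part.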


Consistent Submodular Maximization

Dütting, Paul, Fusco, Federico, Lattanzi, Silvio, Norouzi-Fard, Ashkan, Zadimoghaddam, Morteza

arXiv.org Machine Learning

Submodular optimization is a powerful framework for modeling and solving problems that exhibit the widespread diminishing-returns property. Thanks to its effectiveness, it has been applied across diverse domains, including video analysis [Zheng et al., 2014], data summarization [Lin and Bilmes, 2011, Bairi et al., 2015], sparse reconstruction [Bach, 2010, Das and Kempe, 2011], and active learning [Golovin and Krause, 2011, Amanatidis et al., 2022]. In this paper, we focus on submodular maximization under cardinality constraints: given a submodular function f, a universe of elements V, and a cardinality constraint k, the goal is to find a set S of at most k elements that maximizes f(S). Submodular maximization under cardinality constraints is NP-hard; nevertheless, efficient approximation algorithms exist for this task in both the centralized and the streaming setting [Nemhauser et al., 1978, Badanidiyuru et al., 2014, Kazemi et al., 2019]. One aspect of efficient approximation algorithms for submodular maximization that has received little attention so far is the stability of the solution. In fact, for some of the known algorithms, even adding a single element to the universe of elements V may completely change the final output (see Appendix A for some examples). Unfortunately, this is problematic in many real-world applications where consistency is a fundamental system requirement.
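For context, the cited algorithm of Nemhauser et al. [1978] is the classic greedy rule, which attains a $(1-1/e)$-approximation for monotone submodular $f$ under a cardinality constraint. A minimal sketch follows (the toy coverage function is ours); its instability is exactly the issue at stake, since a single new element in $V$ can flip the first greedy pick and cascade through the rest of the solution.

```python
# Classic greedy for monotone submodular maximization under |S| <= k:
# repeatedly add the element with the largest marginal gain f(e | S).

def greedy(f, V, k):
    S = set()
    for _ in range(k):
        best, best_gain = None, 0.0
        for e in V - S:
            gain = f(S | {e}) - f(S)      # marginal gain f(e | S)
            if gain > best_gain:
                best, best_gain = e, gain
        if best is None:                   # no positive gain remains
            break
        S.add(best)
    return S

# Toy coverage function: f(S) = size of the union of the sets indexed by S.
cover = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d", "e"}}
f = lambda S: len(set().union(*(cover[e] for e in S))) if S else 0
print(greedy(f, set(cover), k=2))          # {1, 3}
```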


Statistical Query Lower Bounds for Learning Truncated Gaussians

Diakonikolas, Ilias, Kane, Daniel M., Pittas, Thanasis, Zarifis, Nikos

arXiv.org Machine Learning

We study the problem of estimating the mean of an identity covariance Gaussian in the truncated setting, in the regime when the truncation set comes from a low-complexity family $\mathcal{C}$ of sets. Specifically, for a fixed but unknown truncation set $S \subseteq \mathbb{R}^d$, we are given access to samples from the distribution $\mathcal{N}(\boldsymbol{\mu}, \mathbf{I})$ truncated to the set $S$. The goal is to estimate $\boldsymbol{\mu}$ within accuracy $\epsilon>0$ in $\ell_2$-norm. Our main result is a Statistical Query (SQ) lower bound suggesting a super-polynomial information-computation gap for this task. In more detail, we show that the complexity of any SQ algorithm for this problem is $d^{\mathrm{poly}(1/\epsilon)}$, even when the class $\mathcal{C}$ is simple so that $\mathrm{poly}(d/\epsilon)$ samples information-theoretically suffice. Concretely, our SQ lower bound applies when $\mathcal{C}$ is a union of a bounded number of rectangles whose VC dimension and Gaussian surface area are small. As a corollary of our construction, it also follows that the complexity of the previously known algorithm for this task is qualitatively best possible.
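To picture the setting (our toy, not the paper's lower-bound construction): rejection sampling from $\mathcal{N}(\boldsymbol{\mu}, \mathbf{I})$ restricted to even a single rectangle already shows how truncation biases the naive empirical mean, which is why estimating $\boldsymbol{\mu}$ requires correcting for the unknown set $S$.

```python
# Rejection sampling from N(mu, I) truncated to S = {x : x[0] >= 0},
# followed by the naive empirical mean, which truncation visibly biases.
import numpy as np

rng = np.random.default_rng(0)
d, mu = 2, np.zeros(2)
in_S = lambda x: x[0] >= 0.0              # membership oracle for S

samples = []
while len(samples) < 10_000:
    x = rng.normal(mu, 1.0, size=d)
    if in_S(x):                           # keep only points landing in S
        samples.append(x)

print(np.mean(samples, axis=0))
# ~[0.80, 0.00]: E[z | z >= 0] = sqrt(2/pi) in the truncated coordinate
```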


Learning from Censored and Dependent Data: The case of Linear Dynamics

Plevrakis, Orestis

arXiv.org Artificial Intelligence

Observations from dynamical systems often exhibit irregularities, such as censoring, where values are recorded only if they fall within a certain range. Censoring is ubiquitous in practice, due to saturating sensors, limit-of-detection effects, and image-frame effects. In light of recent developments on learning linear dynamical systems (LDSs), and on censored statistics with independent data, we revisit the decades-old problem of learning an LDS from censored observations (Lee and Maddala (1985); Zeger and Brookmeyer (1986)). Here, the learner observes the state $x_t \in \mathbb{R}^d$ if and only if $x_t$ belongs to some set $S_t \subseteq \mathbb{R}^d$. We develop the first computationally and statistically efficient algorithm for learning the system, assuming only oracle access to the sets $S_t$. Our algorithm, Stochastic Online Newton with Switching Gradients, is a novel second-order method that builds on the Online Newton Step (ONS) of Hazan et al. (2007). Our Switching-Gradient scheme does not always use (stochastic) gradients of the function we want to optimize, which we call the "censor-aware" function. Instead, in each iteration, it performs a simple test to decide whether to use the censor-aware function or another, "censor-oblivious" function for obtaining a stochastic gradient. In our analysis, we consider a "generic" Online Newton method, which uses arbitrary vectors instead of gradients, and we prove an error bound for it. This can be used to appropriately design these vectors, leading to our Switching-Gradient scheme. This framework significantly deviates from the recent long line of works on censored statistics (e.g., Daskalakis et al. (2018); Kontonis et al. (2019); Daskalakis et al. (2019)), which apply Stochastic Gradient Descent (SGD), and whose analysis reduces to establishing conditions for off-the-shelf SGD bounds.
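The "generic" Online Newton view can be sketched schematically (our paraphrase of the ONS template of Hazan et al. (2007), not the paper's code): the update accepts an arbitrary supplied vector in place of a gradient, and a switching rule decides which of two gradients to feed it. The test below is a hypothetical stand-in for the paper's actual criterion, shown only to convey the shape of the scheme.

```python
# Generic Online Newton step fed with an arbitrary vector g (not
# necessarily a true gradient); the projection step of standard ONS
# is omitted for brevity.
import numpy as np

def ons_step(theta, A, g, eta=0.5):
    A = A + np.outer(g, g)                       # accumulate curvature proxy
    theta = theta - eta * np.linalg.solve(A, g)  # preconditioned step
    return theta, A

def pick_vector(theta, obs, grad_aware, grad_oblivious):
    # Hypothetical switching rule (names ours): use the censor-aware
    # gradient when a state was actually observed (obs is not None),
    # and the censor-oblivious gradient otherwise.
    return grad_aware(theta, obs) if obs is not None else grad_oblivious(theta)

d = 3
theta, A = np.zeros(d), np.eye(d)                # A = I keeps solves well-posed
```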


Exploration with Limited Memory: Streaming Algorithms for Coin Tossing, Noisy Comparisons, and Multi-Armed Bandits

Assadi, Sepehr, Wang, Chen

arXiv.org Artificial Intelligence

Consider the following abstract coin tossing problem: Given a set of $n$ coins with unknown biases, find the most biased coin using a minimal number of coin tosses. This is a common abstraction of various exploration problems in theoretical computer science and machine learning and has been studied extensively over the years. In particular, algorithms with optimal sample complexity (number of coin tosses) have been known for this problem for quite some time. Motivated by applications to processing massive datasets, we study the space complexity of solving this problem with an optimal number of coin tosses in the streaming model. In this model, the coins arrive one by one and the algorithm is only allowed to store a limited number of coins at any point -- any coin not present in the memory is lost and can no longer be tossed or compared to arriving coins. Prior algorithms for the coin tossing problem with optimal sample complexity are based on iterative elimination of coins, which inherently requires storing all the coins, leading to memory-inefficient streaming algorithms. We remedy this state of affairs by presenting a series of improved streaming algorithms for this problem: we start with a simple algorithm which requires storing only $O(\log{n})$ coins and then iteratively refine it further and further, leading to algorithms with $O(\log\log{(n)})$ memory, $O(\log^*{(n)})$ memory, and finally one that stores only a single extra coin in memory -- exactly the space needed to store the best coin throughout the stream. Furthermore, we extend our algorithms to the problem of finding the $k$ most biased coins as well as other exploration problems such as finding top-$k$ elements using noisy comparisons or finding an $\epsilon$-best arm in stochastic multi-armed bandits, and obtain efficient streaming algorithms for these problems.
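To make the single-extra-coin memory model concrete, here is a bare-bones one-pass "champion" scheme (our toy, not the paper's algorithm; with a fixed toss budget per challenge it does not achieve the optimal sample complexity that the paper's careful budgeting obtains):

```python
# One-pass champion scheme storing a single extra coin: each arriving
# coin challenges the current champion, and the loser is discarded.
import random

def toss(p):                               # one toss of a coin with bias p
    return random.random() < p

def streaming_champion(bias_stream, budget=200):
    champ = None
    for p in bias_stream:                  # coins arrive one by one
        if champ is None:
            champ = p
            continue
        wins_new = sum(toss(p) for _ in range(budget))
        wins_old = sum(toss(champ) for _ in range(budget))
        if wins_new > wins_old:            # a discarded coin can never
            champ = p                      # be tossed again
    return champ

print(streaming_champion([0.4, 0.55, 0.5, 0.7, 0.45]))  # likely 0.7
```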


FriendlyCore: Practical Differentially Private Aggregation

Tsfadia, Eliad, Cohen, Edith, Kaplan, Haim, Mansour, Yishay, Stemmer, Uri

arXiv.org Artificial Intelligence

Metric aggregation tasks are at the heart of data analysis. Common tasks include averaging, k-clustering, and learning a mixture of distributions. When the data points are sensitive information, corresponding for example to records or activities of particular users, we would like the aggregation to be private. The most widely accepted solution to individual privacy is differential privacy (DP) [DMNS06], which limits the effect that each data point can have on the outcome of the computation. Differentially private algorithms, however, tend to be less accurate and practical than their non-private counterparts. This degradation in accuracy can be attributed, to a large extent, to the fact that the requirement of differential privacy is a worst-case kind of requirement. To illustrate this point, consider the task of privately learning a mixture of Gaussians.
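To see where the worst-case cost enters, here is a minimal sketch (ours, not FriendlyCore's method) of the standard Gaussian-mechanism baseline for private averaging: the noise scale is calibrated to the worst-case sensitivity $2R/n$ of the clipped mean, even when the actual points are tightly clustered; avoiding that worst-case price on "friendly" inputs is precisely the gap FriendlyCore targets.

```python
# (eps, delta)-DP mean via the Gaussian mechanism, for points clipped to
# l2-norm R; sensitivity 2R/n is for replacement neighbors, eps <= 1.
import numpy as np

def dp_mean(X, R, eps, delta, rng=np.random.default_rng()):
    n, d = X.shape
    norms = np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    X = X * np.minimum(1.0, R / norms)     # clip each point to norm <= R
    sigma = (2 * R / n) * np.sqrt(2 * np.log(1.25 / delta)) / eps
    return X.mean(axis=0) + rng.normal(0.0, sigma, size=d)
```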


A Sublinear Adversarial Training Algorithm

Gao, Yeqi, Qin, Lianke, Song, Zhao, Wang, Yitan

arXiv.org Artificial Intelligence

Adversarial training is a widely used strategy for making neural networks resistant to adversarial perturbations. For a neural network of width $m$ and $n$ training inputs in $d$ dimensions, the forward and backward computation costs $\Omega(mnd)$ time per training iteration. In this paper we analyze the convergence guarantee of the adversarial training procedure on a two-layer neural network with shifted ReLU activation, and show that only $o(m)$ neurons are activated for each input data point per iteration. Furthermore, we develop an algorithm for adversarial training with time cost $o(mnd)$ per iteration by applying a half-space reporting data structure.
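A quick numerical illustration (ours) of why a shifted ReLU $\max(z - b, 0)$ yields sparse activation: with unit-variance Gaussian pre-activations, only the Gaussian-tail fraction of neurons above the shift $b$ fire, and those are exactly the neurons a half-space reporting structure can retrieve without scanning all $m$ of them.

```python
# Fraction of neurons a shifted ReLU activates on a random unit input.
import numpy as np

rng = np.random.default_rng(1)
m, d, b = 10_000, 32, 2.0                 # width, input dim, shift
W = rng.normal(size=(m, d))               # first-layer weights, rows ~ N(0, I)
x = rng.normal(size=d)
x /= np.linalg.norm(x)                    # unit-norm input

pre = W @ x                               # each entry ~ N(0, 1)
active = pre > b                          # half-space test per neuron
print(active.mean())                      # ~0.023: only ~2% of neurons fire
out = np.maximum(pre - b, 0.0)            # shifted-ReLU activations
```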


Sparsification for Sums of Exponentials and its Algorithmic Applications

Li, Jerry, Liu, Allen, Moitra, Ankur

arXiv.org Machine Learning

Many works in signal processing and learning theory operate under the assumption that the underlying model is simple, e.g. that a signal is approximately $k$-Fourier-sparse or that a distribution can be approximated by a mixture model that has at most $k$ components. However, the problem of fitting the parameters of such a model becomes more challenging when the frequencies/components are too close together. In this work we introduce new methods for sparsifying sums of exponentials and give various algorithmic applications. First we study Fourier-sparse interpolation without a frequency gap, where Chen et al. gave an algorithm for finding an $\epsilon$-approximate solution which uses $k' = \mbox{poly}(k, \log 1/\epsilon)$ frequencies. Second, we study learning Gaussian mixture models in one dimension without a separation condition. Kernel density estimators give an $\epsilon$-approximation that uses $k' = O(k/\epsilon^2)$ components. These methods both output models that are much more complex than what we started out with. We show how to post-process to reduce the number of frequencies/components down to $k' = \widetilde{O}(k)$, which is optimal up to logarithmic factors. Moreover we give applications to model selection. In particular, we give the first algorithms for approximately (and robustly) determining the number of components in a Gaussian mixture model that work without a separation condition.
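For context on the kernel-density baseline mentioned above (our sketch, not the paper's method): a KDE for a one-dimensional Gaussian mixture is itself a Gaussian mixture with one component per sample -- accurate, but with far more than $k$ components, which is exactly the kind of over-complex model the paper's post-processing sparsifies back toward $\widetilde{O}(k)$.

```python
# Kernel density estimate for a 1-D mixture of k = 2 Gaussians: the
# estimate is itself a mixture with one width-h component per sample.
import numpy as np

rng = np.random.default_rng(2)
samples = np.concatenate([rng.normal(-2, 1, 500), rng.normal(2, 1, 500)])
h = 1.06 * samples.std() * len(samples) ** (-1 / 5)   # Silverman bandwidth

def kde(x):
    z = (x[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(samples) * h * np.sqrt(2 * np.pi))

grid = np.linspace(-5, 5, 11)
print(np.round(kde(grid), 3))             # two-bump density estimate
```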